fix(edit): correct case-insensitive search/replace offsets for non-length-preserving lowercasing#12754
Open
nyxst4ck wants to merge 1 commit into
Open
Conversation
…ngth-preserving lowercasing caseInsensitiveMatch located the match by calling indexOf on the fully lowercased file content, then returned that index (plus searchContent.length) as a position into the ORIGINAL content. String.prototype.toLowerCase is not length-preserving for every character (e.g. "İ" U+0130 lowercases to "i" + a combining dot, two UTF-16 units), so any such character before or within the match shifts every subsequent index. The returned slice was misaligned, and an edit_existing_file / multi-edit replacement landed at the wrong offset and corrupted the file (e.g. "const value = 1;" became "cconst value = 2;"). Fix: scan the original content and, for each candidate start, lowercase only the aligned slice, comparing against the lowercased search. Indices stay anchored to the original string and the end index is derived from the matched slice rather than the search length. Adds regression tests in findSearchMatch.vitest.ts (both an earlier character and the matched region itself changing length) and an end-to-end test in executeFindAndReplace.vitest.ts.
Contributor
|
All contributors have signed the CLA ✍️ ✅ |
Author
|
I have read the CLA Document and I hereby sign the CLA |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
caseInsensitiveMatch(the fallback strategy used byfindSearchMatch/findSearchMatches, which backedit_existing_fileand the multi-edit tool) located the match by runningindexOfon the fully lowercased file content, then returned that index — plussearchContent.length— as a position into the original content:String.prototype.toLowerCase()is not length-preserving for every character. For example"İ"(U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) lowercases to"i"+ a combining dot (two UTF-16 units). When such a character appears before the match, every subsequent index in the lowercased string is shifted relative to the original, so the returnedstartIndex/endIndexpoint at the wrong characters. When the character is inside the match,endIndexis wrong too because it is derived from the search length rather than the matched region's actual length.Because these indices feed directly into
fileContent.substring(0, start) + new + fileContent.substring(end)inexecuteFindAndReplace, the replacement lands at the wrong offset and corrupts the file. Concrete example (file has a leadingİin a comment):produces
cconst value = 2;onmain(a duplicatedc, originaltdropped) instead ofconst value = 2;.Fix
Scan the original content and, for each candidate start position, lowercase only the slice that lines up with the lowercased search string, comparing for equality. This keeps both indices anchored to the original string, and the end index comes from the matched slice rather than
searchContent.length. Behaviour for ASCII / length-preserving input is unchanged (all 128 existing tests still pass).Checklist
Tests
Verified RED-on-
main, GREEN-with-fix by reverting the implementation while keeping the new tests.core/edit/searchAndReplace/findSearchMatch.vitest.ts— two cases: an earlier character that changes length on lowercase, and the matched region itself changing length. Both assert the returned indices slice the correctly-cased original text.core/edit/searchAndReplace/executeFindAndReplace.vitest.ts— end-to-end test asserting the file is not corrupted.prettierclean on the changed files;tscreports no errors in any changed file.Summary by cubic
Fixes case-insensitive search/replace offsets when lowercasing changes string length, preventing misaligned edits and file corruption. Updates
caseInsensitiveMatchto anchor indices to the original content and adds regression tests.endIndexfrom the matched slice, notsearchContent.length, covering cases like"İ"→"i̇".executeFindAndReplace.Written for commit 91d67a4. Summary will update on new commits.